Add benchmarks for indexing and searching by JJ-Pineda · Pull Request #109 · rdkit-rs/cheminee

JJ-Pineda · 2024-09-23T17:19:18Z

Description
Resolves #108 by adding benchmarks for indexing and searching.

We have now added benchmarks for most of the core functionality used for indexing and searching:

Converting a SMILES string to a Tantivy doc
Compound processing (e.g. standardization)
Structure matching (e.g. exact matching and substructure matching)

These benchmarks make it clear, for example, that our molecular standardization step is more computationally intensive than most of the other functionality.
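As a rough illustration of how per-stage timings can be compared, here is a minimal sketch using only the standard library. It is not the harness used in this PR (the diff uses `test::Bencher`); `mean_time` and `slowest_first` are hypothetical helper names introduced for this example.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `f` `iterations` times and return the mean wall-clock time per call.
/// `black_box` keeps the optimizer from discarding the result.
/// Hypothetical helper, not part of cheminee.
pub fn mean_time<T>(iterations: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(f());
    }
    start.elapsed() / iterations
}

/// Order named stage timings so the slowest stage comes first.
pub fn slowest_first(
    mut timings: Vec<(&'static str, Duration)>,
) -> Vec<(&'static str, Duration)> {
    timings.sort_by(|a, b| b.1.cmp(&a.1));
    timings
}
```

Each stage (SMILES to doc, standardization, structure matching) would be wrapped in its own closure and passed to `mean_time`; feeding the labelled results to `slowest_first` then shows which stage dominates.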

xrl · 2024-09-25T17:03:31Z

benches/indexing_benches.rs

+        let _ = writer.add_document(doc).unwrap();
+    });
+
+    let _ = writer.commit();


I think this is a tough thing to benchmark. The I/O really happens in the commit, where any remaining buffered docs are written to disk. I wouldn't try benchmarking I/O; I would stick to, say, searching through docs that are buffered in memory.

Imagine your service going this way:

Fill the index with juicy molecules

Cut down size of index based on atom count, etc

Load the top 1000 matching docs into cheminee's memory

Rank those 1000 matching docs using search routine $XYZ

Return the top N hits

I think you want to create the output vec from step 2 in static memory, somehow, and then you just want to benchmark the search routine of step 3 and assert the stable sort going into step 4.

Fake 1/2, benchmark 3, and assert that 4 is what you expect. If you try to benchmark the actual I/O to get through 1/2, you will find it's highly variable and does not give a reliable picture.
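A minimal sketch of that shape, using only the standard library: the candidate docs are built in memory, only the ranking is timed, and the ordering is checked with ordinary assertions. Everything here is hypothetical (the `Candidate` struct, the atom-count `score`, and the function names); it stands in for whatever search routine is actually ranked and is not cheminee's code.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a doc already loaded from the index.
#[derive(Clone, Debug, PartialEq)]
pub struct Candidate {
    pub id: u32,
    pub atom_count: u32,
}

/// Faked input: build the candidate list in memory, with no index or disk I/O.
pub fn fake_candidates(n: u32) -> Vec<Candidate> {
    (0..n)
        .map(|id| Candidate { id, atom_count: 10 + (id * 7) % 40 })
        .collect()
}

/// Hypothetical scoring routine: distance from a target atom count (lower is better).
pub fn score(c: &Candidate, target_atoms: u32) -> u32 {
    c.atom_count.abs_diff(target_atoms)
}

/// Rank the candidates and keep the top `n`.
/// `sort_by_key` is a stable sort, so ties keep their input order.
pub fn rank_top_n(candidates: &[Candidate], target_atoms: u32, n: usize) -> Vec<Candidate> {
    let mut ranked = candidates.to_vec();
    ranked.sort_by_key(|c| score(c, target_atoms));
    ranked.truncate(n);
    ranked
}

/// Time only the ranking step; the input is prepared by the caller beforehand.
pub fn time_ranking(candidates: &[Candidate], iterations: u32) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(rank_top_n(black_box(candidates), 25, 10));
    }
    start.elapsed() / iterations
}
```

Calling `fake_candidates(1000)` once, passing the result to `time_ranking`, and separately asserting on `rank_top_n` keeps the measurement free of disk I/O while still pinning down the expected order of the hits.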

xrl · 2024-09-25T17:04:14Z

benches/search_benches.rs

+use std::collections::{HashMap, HashSet};
+use std::ops::Deref;
+use tantivy::schema::Field;
+use test::Bencher;


This include block is getting unwieldy. I'll do a future PR to reformat our project. Just a note to myself here.

xrl · 2024-09-25T17:04:39Z

benches/search_benches.rs

+        let searcher = reader.searcher();
+        let results = basic_search(&searcher, &query, 100).unwrap();
+        let _final_results = aggregate_query_hits(searcher, results, &query).unwrap();
+    });


Again, I think benchmarking I/O is going to provide an inaccurate picture.


JJ-Pineda · 2024-09-25T19:27:02Z

How about we just bench the core functionality? This has already been illuminating for determining the slowest bits of functionality. Standardization of molecules looks to be the slowest step by far, which I guess makes sense.

added benchmarks for indexing and searching

3227316

JJ-Pineda marked this pull request as ready for review September 23, 2024 17:22

JJ-Pineda requested a review from xrl September 23, 2024 17:22

xrl reviewed Sep 25, 2024

src/search/identity_search.rs (review thread resolved)

instead bench core functionality used for compound processing, indexing, and searching

fd21ff7

JJ-Pineda requested a review from xrl September 25, 2024 19:27

JJ-Pineda wants to merge 2 commits into main from add_benchmarks